pacman::p_load(sf, raster, spatstat, tmap, tidyverse)Hands-on Exercise 2.1: 1st Order Spatial Point Patterns Analysis Methods
1 Overview
Spatial Point Pattern Analysis examines the pattern or distribution of points on a surface. These points can represent locations of events such as crimes, traffic accidents, or disease outbreaks, as well as business services (like coffee shops and fast food outlets) or facilities like childcare and eldercare centers.
In this hands-on exercise, we will use functions from the spatstat package to explore the spatial distribution of childcare centers in Singapore.
The key questions we aim to answer are:
- Are the childcare centers in Singapore randomly distributed across the country?
- If not, where are the areas with a higher concentration of childcare centers?
2 Data Acquisition
The datasets required for this exercise are extracted from the below public data sources:
CHILDCARE: A point feature dataset that provides both location and attribute information of childcare centers. This dataset was downloaded from data.gov.sg in GeoJSON format.MP14_SUBZONE_WEB_PL: A polygon feature dataset that contains information on the URA 2014 Master Plan Planning Subzone boundaries. This dataset is in ESRI Shapefile format and was also downloaded from data.gov.sg.CostalOutline: A polygon feature dataset that shows the national boundary of Singapore. This dataset is provided by the Singapore Land Authority (SLA) and is in ESRI Shapefile format.
3 Import R Packages
p_load() of pacman package is used to install and load sf, tmap, tidyverse, spatstat and raster packages into R environment.
4 Spatial Data Wrangling
4.1 Import Spatial Data
First we use st_read() of sf package used to import these three geospatial data sets into R.
As the childcare_sf simple feature data frame is in wgs84 geodetic CRS, which is not suitable for geospatial analysis, st_transform() of sf package is used to reproject the data frame to svy21 at the same time of import using below code chunk.
childcare_sf <- st_read("data/child-care-services-geojson.geojson") %>%
st_transform(crs = 3414)Reading layer `child-care-services-geojson' from data source
`C:\thuphuong1110\ISSS626-GAA\Hands-on_Ex\Hands-on_Ex02\data\child-care-services-geojson.geojson'
using driver `GeoJSON'
Simple feature collection with 1545 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6824 ymin: 1.248403 xmax: 103.9897 ymax: 1.462134
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
We re-check the crs using below code chunk. The EPSG already reflects 3414 as expected.
st_crs(childcare_sf)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
Geospatial data is imported using below code chunk.
sg_sf <- st_read(dsn = "data", layer = "CostalOutline")Reading layer `CostalOutline' from data source
`C:\thuphuong1110\ISSS626-GAA\Hands-on_Ex\Hands-on_Ex02\data'
using driver `ESRI Shapefile'
Simple feature collection with 60 features and 4 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 2663.926 ymin: 16357.98 xmax: 56047.79 ymax: 50244.03
Projected CRS: SVY21
Check the predefined coordinate system of this simple feature data frame using st_crs() of sf package.
st_crs(sg_sf)Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
The last lines of the print shows that EPSG code 9001 is used instead of the correct EPSG code 3414 for coordinate reference system svy21. The correct EPSG code is assigned using st_set_crs() as below.
sg_sf = st_set_crs(sg_sf, 3414)Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
that
We check the CRS again.
st_crs(sg_sf)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
The EPSG code is now 3414.
mpsz_sf <- st_read(dsn = "data",
layer = "MP14_SUBZONE_WEB_PL")Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\thuphuong1110\ISSS626-GAA\Hands-on_Ex\Hands-on_Ex02\data'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
First, we check the predefined coordinate system of mpsz_sf simple feature data frame using st_crs().
st_crs(mpsz_sf)Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
Output interpretation: The last lines of the print shows that EPSG code 9001 is used instead of the correct EPSG code 3414 for coordinate reference system svy21. The correct EPSG code is assigned to mpsz_sf data frame using st_set_crs() as below.
mpsz_sf <- st_set_crs(mpsz_sf,3414)We check the CRS again.
st_crs(mpsz_sf)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
The EPSG code is now 3414.
4.2 Plot the Map from geospatial data sets
After verifying the coordinate reference system (CRS) of each geospatial dataset, it is helpful to plot a map to visualize their spatial patterns.
tm_shape(mpsz_sf) +
tm_polygons() +
tm_shape(childcare_sf) +
tm_dots()
Notice that all the geospatial layers share the same map extent, indicating that their coordinate reference systems and values are aligned to the same spatial context. This alignment is crucial for any geospatial analysis.
Alternatively, we can create a pin map using the code snippet below.
tmap_mode('view')tmap mode set to interactive viewing
tm_shape(childcare_sf)+
tm_dots()tmap_mode('plot')tmap mode set to plotting
In interactive mode, tmap uses the Leaflet for R API. The benefit of this interactive pin map is that it allows us to freely navigate and zoom in or out. Additionally, we can click on each point to query detailed information about that feature. Three background options of the online map layer are currently available: ESRI.WorldGrayCanvas, OpenStreetMap, and ESRI.WorldTopoMap, with ESRI.WorldGrayCanvas set as the default.
Note: Always switch back to plot mode after using the interactive map, as each interactive session consumes a connection. Additionally, to prevent issues when publishing on Netlify, keep to fewer than 10 interactive maps in a single RMarkdown document.
5 Geospatial Data wrangling
Although simple feature data frames are becoming increasingly popular compared to the sp package’s Spatial* classes, many geospatial analysis packages still require geospatial data in the sp package’s Spatial* format. In this section, you explore how to convert a simple feature data frame to an sp Spatial* class.
5.1 Convert sf data frames to sp’s Spatial* class
The code chunk below uses the as_Spatial() function from the sf package to convert the three geospatial data from simple feature data frames to sp Spatial* classes.
childcare <- as_Spatial(childcare_sf)
mpsz <- as_Spatial(mpsz_sf)
sg <- as_Spatial(sg_sf)childcareclass : SpatialPointsDataFrame
features : 1545
extent : 11203.01, 45404.24, 25667.6, 49300.88 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 2
names : Name, Description
min values : kml_1, <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>ADDRESSBLOCKHOUSENUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSBUILDINGNAME</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSPOSTALCODE</th> <td>018989</td> </tr><tr bgcolor=""> <th>ADDRESSSTREETNAME</th> <td>1, MARINA BOULEVARD, #B1 - 01, ONE MARINA BOULEVARD, SINGAPORE 018989</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSTYPE</th> <td></td> </tr><tr bgcolor=""> <th>DESCRIPTION</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>HYPERLINK</th> <td></td> </tr><tr bgcolor=""> <th>LANDXADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor="#E3E3F3"> <th>LANDYADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor=""> <th>NAME</th> <td>THE LITTLE SKOOL-HOUSE INTERNATIONAL PTE. LTD.</td> </tr><tr bgcolor="#E3E3F3"> <th>PHOTOURL</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSFLOORNUMBER</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>08F73931F4A691F4</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200826094036</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSUNITNUMBER</th> <td></td> </tr></table></center>
max values : kml_999, <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>ADDRESSBLOCKHOUSENUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSBUILDINGNAME</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSPOSTALCODE</th> <td>829646</td> </tr><tr bgcolor=""> <th>ADDRESSSTREETNAME</th> <td>200, PONGGOL SEVENTEENTH AVENUE, SINGAPORE 829646</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSTYPE</th> <td></td> </tr><tr bgcolor=""> <th>DESCRIPTION</th> <td>Child Care Services</td> </tr><tr bgcolor="#E3E3F3"> <th>HYPERLINK</th> <td></td> </tr><tr bgcolor=""> <th>LANDXADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor="#E3E3F3"> <th>LANDYADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor=""> <th>NAME</th> <td>RAFFLES KIDZ @ PUNGGOL PTE LTD</td> </tr><tr bgcolor="#E3E3F3"> <th>PHOTOURL</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSFLOORNUMBER</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>379D017BF244B0FA</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200826094036</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSUNITNUMBER</th> <td></td> </tr></table></center>
sgclass : SpatialPolygonsDataFrame
features : 60
extent : 2663.926, 56047.79, 16357.98, 50244.03 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 4
names : GDO_GID, MSLINK, MAPID, COSTAL_NAM
min values : 1, 1, 0, ISLAND LINK
max values : 60, 67, 0, SINGAPORE - MAIN ISLAND
mpszclass : SpatialPolygonsDataFrame
features : 323
extent : 2667.538, 56396.44, 15748.72, 50256.33 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 15
names : OBJECTID, SUBZONE_NO, SUBZONE_N, SUBZONE_C, CA_IND, PLN_AREA_N, PLN_AREA_C, REGION_N, REGION_C, INC_CRC, FMEL_UPD_D, X_ADDR, Y_ADDR, SHAPE_Leng, SHAPE_Area
min values : 1, 1, ADMIRALTY, AMSZ01, N, ANG MO KIO, AM, CENTRAL REGION, CR, 00F5E30B5C9B7AD8, 16409, 5092.8949, 19579.069, 871.554887798, 39437.9352703
max values : 323, 17, YUNNAN, YSSZ09, Y, YISHUN, YS, WEST REGION, WR, FFCCF172717C2EAF, 16409, 50424.7923, 49552.7904, 68083.9364708, 69748298.792
The geospatial data have been converted into their respective sp’s Spatial* classes.
5.2 Convert the Spatial* class into generic sp format
spatstat requires analytical data in ppp object form. There is no direct method to convert Spatial* classes into ppp objects, so we first need to convert Spatial* classes into generic sp objects.
The code chunk below performs this conversion.
childcare_sp <- as(childcare, "SpatialPoints")
sg_sp <- as(sg, "SpatialPolygons")These sp objects properties are displayed as below.
childcare_spclass : SpatialPoints
features : 1545
extent : 11203.01, 45404.24, 25667.6, 49300.88 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
sg_spclass : SpatialPolygons
features : 60
extent : 2663.926, 56047.79, 16357.98, 50244.03 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
Note: The are certain differences between Spatial* classes and generic sp object. Taking SpatialPointsDataFrame (Spatial* classes) and SpatialPoints (generic sp object) as an example:
SpatialPointsclass: used to represent a simple collection of spatial points in a given coordinate system. This class focuses solely on the geometric aspect of spatial data, i.e., the locations of the points.SpatialPointsDataFrameclass: extendsSpatialPointsby combining spatial coordinates with a data frame of attribute data. This class allows you to store both spatial and non-spatial (attribute) data together.
5.3 Convert the generic sp format into spatstat’s ppp format
Next ppp() function of spatstat is used to convert the SpatialPoints object into spatstat’s ppp object using 2 steps:
Extract the point coordinates from the SpatialPoints object.
Define the observation window for the ppp object, usually as a rectangle or polygon encompassing all the points.
# Extract the bounding box and point coordinates from the SpatialPoints object
bbox <- bbox(childcare_sp)
coords <- coordinates(childcare_sp)
# Define the observation window for the ppp object, usually as a rectangle or polygon encompassing all the points.
window <- owin(xrange = bbox[1, ], yrange = bbox[2, ])
# Convert SpatialPoints object to ppp using ppp()
childcare_ppp <- ppp(x = coords[, 1], y = coords[, 2], window = window)Warning: data contain duplicated points
childcare_pppPlanar point pattern: 1545 points
window: rectangle = [11203.01, 45404.24] x [25667.6, 49300.88] units
We plot childcare_ppp and examine the different.
plot(childcare_ppp)
We can see the subzone boundary is not shown and the points are displayed in overlapping characters.
The summary statistics of the newly created ppp object is shown using the code chunk below.
summary(childcare_ppp)Planar point pattern: 1545 points
Average intensity 1.91145e-06 points per square unit
*Pattern contains duplicated points*
Coordinates are given to 11 decimal places
Window: rectangle = [11203.01, 45404.24] x [25667.6, 49300.88] units
(34200 x 23630 units)
Window area = 808287000 square units
Note the warning message about duplicates. In spatial point pattern analysis, duplicates are a significant issue. The statistical methods used for spatial point pattern analysis often assume that the processes are simple, meaning that points should not overlap.
5.4 Handle duplicated points
We can check if the ppp object contain any duplicated point using below code chunk.
any(duplicated(childcare_ppp))[1] TRUE
The multiplicity() function can be used to count the number of co-incident points.
multiplicity(childcare_ppp)The number of locations having more than one point event is counted using the code chunk below.
sum(multiplicity(childcare_ppp) > 1)[1] 128
The output indicates there are 128 duplicated point events.
To visualize the locations of these duplicate points, we plot the childcare data using the code chunk below.
tmap_mode('view')tmap mode set to interactive viewing
tm_shape(childcare) +
tm_dots(alpha=0.4,
size=0.05)tmap_mode("plot")tmap mode set to plotting
Note: As alpha defines the transparency of the dots, locations with darker dots (less transparent) indicates duplication since it have multiple points overlaying the same spot.
There are three ways to address this issue of duplicated points:
The simplest method is to delete the duplicates, but this could result in losing valuable point events.
The second option is to use jittering, which adds a small perturbation to the duplicate points so they no longer occupy the exact same location.
The third approach is to make each point “unique” and attach duplicates as marks or attributes to these points. This requires using analytical techniques that consider these marks.
The code chunk below implements the jittering approach.
childcare_ppp_jit <- rjitter(childcare_ppp,
retry=TRUE,
nsim=1,
drop=TRUE)We check again for duplication.
any(duplicated(childcare_ppp_jit))[1] FALSE
The output is FALSE indicating there are no duplicated point in childcare_ppp_jit
5.5 Create owin object
When analyzing spatial point patterns, it is important to limit the analysis to a specific geographical area, such as the boundary of Singapore. In spatstat, an object called owin is specifically designed to represent such polygonal regions.
The code chunk below converts the sg simple feature object into an owin object for use in spatstat.
sg_owin <- as.owin(sg_sf)Plot the output object using plot() function.
plot(sg_owin)
View the summary using summary() of Base R.
summary(sg_owin)5.6 Combine point events object and owin object
In this final step of geospatial data wrangling, we use the below code chunk to extract childcare events that are located within Singapore boundary.
childcareSG_ppp = childcare_ppp[sg_owin]The output object combines both the point and polygon features into a single ppp object class, as shown below.
summary(childcareSG_ppp)Plot the output object.
plot(childcareSG_ppp)
6 First-order Spatial Point Patterns Analysis
In this section, we explore how to perform first-order Spatial Point Pattern Analysis (SPPA) using spatstat package. The following subsections will focus on:
Deriving a Kernel Density Estimation (KDE) layer to visualize and explore the intensity of point processes.
Conducting Confirmatory Spatial Point Pattern Analysis using Nearest Neighbour statistics.
6.1 Kernel Density Estimation
6.1.1 Compute Kernel Density Estimation using Automatic Bandwidth Selection method
The code chunk below computes a kernel density estimation using spatstat package’s density() function with the following configurations:
Bandwidth selection method: bw.diggle() is used for automatic bandwidth selection. Other recommended methods include bw.CvL(), bw.scott(), or bw.ppl().
Smoothing kernel: The Gaussian kernel is used by default. Other available smoothing methods are “epanechnikov,” “quartic,” and “disc.”
Edge effect bias correction: The intensity estimate is corrected for edge effects using the method described by Jones (1993) and Diggle (2010, equation 18.9). This correction is set to TRUE by default.
kde_childcareSG_bw <- density(childcareSG_ppp,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")We use the plot() function of Base R to display the kernel density derived.
plot(kde_childcareSG_bw)
The density values in the output range from 0 to 0.000035 (the bar on the right hand side), which is too small to interpret easily. This is because the default unit of measurement for svy21 is meters, so the computed density values are in “number of points per square meter.”
The bandwidth used to compute the KDE layer can be retrieved using below code chunk.
bw <- bw.diggle(childcareSG_ppp)
bw sigma
298.4095
6.1.2 Rescale KDE values
We can covert the unit of measurement from meter to kilometer using rescale.ppp().
childcareSG_ppp.km <- rescale.ppp(childcareSG_ppp, 1000, "km")Re-run density() using the resale data set and plot the output kde map.
kde_childcareSG.bw <- density(childcareSG_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
plot(kde_childcareSG.bw)
The output map looks identical to the earlier version, the only difference is the density values range (the legend on the right).
6.2 Work with different Automatic Bandwidth Methods
Apart from bw.diggle(), there are three other spatstat functions that can be used to determine the bandwidth: bw.CvL(), bw.scott(), and bw.ppl().
Let’s examine the bandwidth values returned by these automatic calculation methods using below code chunk.
bw.CvL(childcareSG_ppp.km) sigma
4.543278
bw.scott(childcareSG_ppp.km) sigma.x sigma.y
2.224898 1.450966
bw.ppl(childcareSG_ppp.km) sigma
0.3897114
bw.diggle(childcareSG_ppp.km) sigma
0.2984095
Baddeley et al. (2016) suggested using bw.ppl() algorithm because, in their experience, it tends to produce more appropriate values when the point pattern mainly consists of tight clusters. However, if the goal of a study is to detect a single tight cluster within random noise, the bw.diggle() method is considered by them to be the best choice.
The code chunk below compare the outputs of the bw.diggle() and bw.ppl() methods.
kde_childcareSG.ppl <- density(childcareSG_ppp.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian")
par(mfrow=c(1,2))
plot(kde_childcareSG.bw, main = "bw.diggle")
plot(kde_childcareSG.ppl, main = "bw.ppl")
There are no significant difference between the 2 plots. However the output map using bw.ppl seems to have more areas with high density values (more areas colored in the higher value range).
6.3 Work with different kernel methods
By default, the kernel method used in density.ppp() is Gaussian. However, there are three other options: Epanechnikov, Quartic, and Disc.
The code chunk below compute three additional kernel density estimations using these kernel functions and plot the output of all four kernel methods for comparison.
par(mfrow=c(2,2))
sigma_val = bw.ppl(childcareSG_ppp.km)
plot(density(childcareSG_ppp.km,
sigma=sigma_val,
edge=TRUE,
kernel="gaussian"),
main="Gaussian")
plot(density(childcareSG_ppp.km,
sigma=sigma_val,
edge=TRUE,
kernel="epanechnikov"),
main="Epanechnikov")
plot(density(childcareSG_ppp.km,
sigma=sigma_val,
edge=TRUE,
kernel="quartic"),
main="Quartic")
plot(density(childcareSG_ppp.km,
sigma=sigma_val,
edge=TRUE,
kernel="disc"),
main="Disc")
7 Fixed and Adaptive KDE
7.1 Computing KDE using fixed bandwidth
Next we compute a Kernel Density Estimation (KDE) layer by defining a bandwidth of 600 meters. In the below code chunk, the sigma value is set to 0.6 since the unit of measurement of childcareSG_ppp.km object is in kilometers, so 600 meters = 0.6 kilometers.
kde_childcareSG_600 <- density(childcareSG_ppp.km,
sigma=0.6,
edge=TRUE,
kernel="gaussian")
plot(kde_childcareSG_600)
7.2 Compute KDE using adaptive bandwidth
As fixed bandwidth methods is very sensitive to highly skewed distributions of spatial point patterns across different geographical units, such as urban and rural areas. To address this issue, we can use adaptive bandwidth methods.
In this section, we explore how to derive adaptive kernel density estimation using the adaptive.density() function from spatstat.
kde_childcareSG_adaptive <- adaptive.density(childcareSG_ppp.km, method="kernel")
plot(kde_childcareSG_adaptive)
We can compare the fixed and adaptive kernel density estimation outputs using below code chunk.
par(mfrow=c(1,2))
plot(kde_childcareSG.bw, main = "Fixed Bandwith")
plot(kde_childcareSG_adaptive, main = "Adaptive Bandwith")
7.3 Convert KDE output into grid object
The result is the same, the conversion is to make the data suitable for mapping purposes.
The KDE output is in pixel image as shown below.
summary(kde_childcareSG.bw)real-valued pixel image
128 x 128 pixel array (ny, nx)
enclosing rectangle: [2.663926, 56.04779] x [16.35798, 50.24403] km
dimensions of each pixel: 0.417 x 0.2647348 km
Image is defined on a subset of the rectangular grid
Subset area = 726.060565732197 square km
Subset area fraction = 0.401
Pixel values (inside window):
range = [-8.476185e-15, 28.51831]
integral = 1547.222
mean = 2.130982
The KDE output image is converted to SpatialGridDataFrame and plot using below code.
# Convert image output to SpatialGridDataFrame
gridded_kde_childcareSG_bw <- as(kde_childcareSG.bw, "SpatialGridDataFrame")
# Plot the SpatialGridDataFrame
spplot(gridded_kde_childcareSG_bw, main = "Kernel Density Estimate")
7.3.1 Convert gridded output into raster
We convert the gridded kernal density objects into RasterLayer object by using raster() of raster package.
kde_childcareSG_bw_raster <- raster(gridded_kde_childcareSG_bw)Let us take a look at the properties of kde_childcareSG_bw_raster RasterLayer.
kde_childcareSG_bw_rasterclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.4170614, 0.2647348 (x, y)
extent : 2.663926, 56.04779, 16.35798, 50.24403 (xmin, xmax, ymin, ymax)
crs : NA
source : memory
names : v
values : -8.476185e-15, 28.51831 (min, max)
Note: the crs property is NA.
7.3.2 Assign projection systems
The code chunk below assigns the CRS information on kde_childcareSG_bw_raster RasterLayer.
projection(kde_childcareSG_bw_raster) <- CRS("+init=EPSG:3414")
kde_childcareSG_bw_rasterclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.4170614, 0.2647348 (x, y)
extent : 2.663926, 56.04779, 16.35798, 50.24403 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs
source : memory
names : v
values : -8.476185e-15, 28.51831 (min, max)
The CRS property is completed now.
7.4 Visualize the output in tmap
We display the raster in cartographic quality map using tmap package.
tm_shape(kde_childcareSG_bw_raster) +
tm_raster(palette = "viridis") +
tm_layout(legend.position = c("right", "bottom"), frame = FALSE)
**Note*: The raster values are encoded explicitly onto the raster pixel using the values in “v”” field.
7.5 Compare Spatial Point Patterns using KDE
In this section, we explore how to compare KDE of childcare at Ponggol, Tampines, Chua Chu Kang and Jurong West planning areas.
7.5.1 Extract study area
The below code chunk is used to extract the 4 target planning areas.
pg <- mpsz_sf %>%
filter(PLN_AREA_N == "PUNGGOL")
tm <- mpsz_sf %>%
filter(PLN_AREA_N == "TAMPINES")
ck <- mpsz_sf %>%
filter(PLN_AREA_N == "CHOA CHU KANG")
jw <- mpsz_sf %>%
filter(PLN_AREA_N == "JURONG WEST")Plot the target planning areas
par(mfrow=c(2,2))
plot(pg, main = "Ponggol")Warning: plotting the first 9 out of 15 attributes; use max.plot = 15 to plot
all

plot(tm, main = "Tampines")Warning: plotting the first 9 out of 15 attributes; use max.plot = 15 to plot
all

plot(ck, main = "Choa Chu Kang")Warning: plotting the first 10 out of 15 attributes; use max.plot = 15 to plot
all

plot(jw, main = "Jurong West")Warning: plotting the first 9 out of 15 attributes; use max.plot = 15 to plot
all

7.5.2 Create owin object
We convert these sf objects into owin objects as required by spatstat.
pg_owin = as.owin(pg)
tm_owin = as.owin(tm)
ck_owin = as.owin(ck)
jw_owin = as.owin(jw)7.5.3 Combine childcare points and the study area
We extract childcare centres within the selected regions for further analysis.
childcare_pg_ppp = childcare_ppp_jit[pg_owin]
childcare_tm_ppp = childcare_ppp_jit[tm_owin]
childcare_ck_ppp = childcare_ppp_jit[ck_owin]
childcare_jw_ppp = childcare_ppp_jit[jw_owin]Next, rescale.ppp() function is used to transform the unit of measurement from metre to kilometre.
childcare_pg_ppp.km = rescale.ppp(childcare_pg_ppp, 1000, "km")
childcare_tm_ppp.km = rescale.ppp(childcare_tm_ppp, 1000, "km")
childcare_ck_ppp.km = rescale.ppp(childcare_ck_ppp, 1000, "km")
childcare_jw_ppp.km = rescale.ppp(childcare_jw_ppp, 1000, "km")The code chunk below plot these four study areas and the locations of the childcare centres.
par(mfrow=c(2,2))
plot(childcare_pg_ppp.km, main="Punggol")
plot(childcare_tm_ppp.km, main="Tampines")
plot(childcare_ck_ppp.km, main="Choa Chu Kang")
plot(childcare_jw_ppp.km, main="Jurong West")
7.5.4 Compute KDE
The below code chunk computes the KDE of these four planning area. bw.diggle method is used to derive the bandwidth.
par(mfrow=c(2,2))
plot(density(childcare_pg_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian"),
main="Punggol")Warning: Berman-Diggle Cross-Validation criterion was minimised at right-hand
end of interval [0, 0.245]; use argument 'hmax' to specify a wider interval for
bandwidth 'sigma'
plot(density(childcare_tm_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian"),
main="Tempines")
plot(density(childcare_ck_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian"),
main="Choa Chu Kang")
plot(density(childcare_jw_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian"),
main="JUrong West")
7.5.5 Compute fixed bandwidth KDE
250m bandwidth is used in below code chunk for comparison purpose.
par(mfrow=c(2,2))
plot(density(childcare_ck_ppp.km,
sigma=0.25,
edge=TRUE,
kernel="gaussian"),
main="Chou Chu Kang")
plot(density(childcare_jw_ppp.km,
sigma=0.25,
edge=TRUE,
kernel="gaussian"),
main="JUrong West")
plot(density(childcare_pg_ppp.km,
sigma=0.25,
edge=TRUE,
kernel="gaussian"),
main="Punggol")
plot(density(childcare_tm_ppp.km,
sigma=0.25,
edge=TRUE,
kernel="gaussian"),
main="Tampines")
8 Nearest Neighbour Analysis
In this section, we perform the Clark-Evans test of aggregation for a spatial point pattern using the clarkevans.test() function from the statspat package.
The hypotheses for the test are:
H0: The distribution of childcare services is randomly distributed.
H1: The distribution of childcare services is not randomly distributed.
A 95% confidence interval will be used for the test.
8.1 Clark and Evans Test
clarkevans.test(childcareSG_ppp,
correction="none",
clipregion="sg_owin",
alternative=c("clustered"),
nsim=99)
Clark-Evans test
No edge correction
Z-test
data: childcareSG_ppp
R = 0.55631, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)
As p-value is smaller than 0.05, we can reject the null hypothesis and infer that the distribution of childcare centres in Singapore is not random but rather clustered (due to alternative=“clustered”).
We use clarkevans.test() to performs Clark-Evans test of aggregation for childcare centre in Choa Chu Kang planning area.
clarkevans.test(childcare_ck_ppp,
correction="none",
clipregion=NULL,
alternative=c("two.sided"),
nsim=999)
Clark-Evans test
No edge correction
Z-test
data: childcare_ck_ppp
R = 0.98729, p-value = 0.8494
alternative hypothesis: two-sided
As p-value is larger than 0.05, we cannot reject the null hypothesis that the distribution of childcare centre in Choa Chu Kang is randomly distributed.
We use clarkevans.test() to performs Clark-Evans test of aggregation for childcare centre in Tampines planning area.
clarkevans.test(childcare_tm_ppp,
correction="none",
clipregion=NULL,
alternative=c("two.sided"),
nsim=999)
Clark-Evans test
No edge correction
Z-test
data: childcare_tm_ppp
R = 0.79124, p-value = 0.0001648
alternative hypothesis: two-sided
As p-value is smaller than 0.05, we can reject the null hypothesis and infer that the distribution of childcare centres in Tampines is not random but either have a clustered or regular pattern (due to alternative=“two.sided”).
The below density map shows that the distribution in Tampines is rather clustered.
plot(density(childcare_tm_ppp.km,
sigma=0.25,
edge=TRUE,
kernel="gaussian"),
main="Tampines")
We use clarkevans.test() to performs Clark-Evans test of aggregation for childcare centre in Jurong West planning area.
clarkevans.test(childcare_jw_ppp,
correction="none",
clipregion=NULL,
alternative=c("two.sided"),
nsim=999)
Clark-Evans test
No edge correction
Z-test
data: childcare_jw_ppp
R = 0.90301, p-value = 0.08177
alternative hypothesis: two-sided
As p-value is larger than 0.05, we cannot reject the null hypothesis that the distribution of childcare centre in Jurong West is randomly distributed.
We use clarkevans.test() to performs Clark-Evans test of aggregation for childcare centre in Punggol planning area.
clarkevans.test(childcare_pg_ppp,
correction="none",
clipregion=NULL,
alternative=c("two.sided"),
nsim=999)
Clark-Evans test
No edge correction
Z-test
data: childcare_pg_ppp
R = 0.88762, p-value = 0.09314
alternative hypothesis: two-sided
As p-value is larger than 0.05, we cannot reject the null hypothesis that the distribution of childcare centre in Punggol is randomly distributed.